Persistent Homology Analysis of Biomolecular Data
Abstract
Technological advances in the past few decades have fueled the exponential growth of “omic” data in biology. Understanding the rules of life from existing omic data sets, which offer unprecedented opportunities for mathematicians, remains an important mission of the field. The biomolecular structure-function relationship is a major rule of life, and recognizing this relationship is the holy grail of biophysics and a central issue in experimental biology. Geometric modeling is vital to the comprehension of biomolecular structure-function relationships. It also bridges the gap between biological data and theoretical models, such as quantum mechanics, molecular mechanics, statistical mechanics, thermodynamics, and multiscale models. However, geometry-based models are frequently inundated with too much structural detail and are thus often computationally intractable.
Topology provides the ultimate abstraction of geometric complexity: it concerns only the connectivity of different components in a space and characterizes independent entities, rings, and higher-dimensional faces of the space in terms of topological invariants, or Betti numbers. To study topological invariants in a discrete data set, such as the atoms in a biomolecule, algebraic topology utilizes simplicial complexes under various settings, such as the Vietoris-Rips complex, Čech complex, or alpha complex. Specifically, a 0-simplex is a vertex, a 1-simplex an edge, a 2-simplex a triangle, and a 3-simplex a tetrahedron, as illustrated in Figure 1. Algebraic groups built on these simplicial complexes are used in simplicial homology to systematically compute Betti numbers for a given data set [7]. Nevertheless, traditional topology and homology are truly free of metrics or coordinates and thus keep too little geometric information to be practically useful for biomolecules.
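To make the simplicial homology computation described above concrete, the following minimal sketch computes Betti numbers of a small simplicial complex from the ranks of its boundary maps. It is an illustration only, not the implementation used in the cited work: the function names (`closure`, `rank_gf2`, `betti_numbers`) are hypothetical, and coefficients are assumed to be in GF(2), so torsion is ignored.

```python
from itertools import combinations

def closure(top_simplices):
    """Collect every face of the given simplices, grouped by dimension."""
    faces = {}
    for simplex in top_simplices:
        simplex = tuple(sorted(simplex))
        for size in range(1, len(simplex) + 1):
            for face in combinations(simplex, size):
                faces.setdefault(size - 1, set()).add(face)
    return {dim: sorted(fs) for dim, fs in faces.items()}

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    basis = {}  # leading bit -> reduced row with that leading bit
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]  # cancel the leading bit and keep reducing
    return len(basis)

def betti_numbers(top_simplices):
    """Betti numbers b_0, b_1, ... of the complex spanned by the simplices."""
    faces = closure(top_simplices)
    max_dim = max(faces)
    ranks = {0: 0, max_dim + 1: 0}  # boundary maps outside the complex are zero
    for k in range(1, max_dim + 1):
        index = {face: i for i, face in enumerate(faces[k - 1])}
        rows = []
        for simplex in faces[k]:
            mask = 0
            # A k-simplex has k + 1 vertices; its boundary faces have k vertices.
            for face in combinations(simplex, k):
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    # b_k = dim C_k - rank(d_k) - rank(d_{k+1})
    return [len(faces[k]) - ranks[k] - ranks[k + 1] for k in range(max_dim + 1)]

if __name__ == "__main__":
    print(betti_numbers([(0, 1), (1, 2), (0, 2)]))  # hollow triangle: [1, 1]
    print(betti_numbers([(0, 1, 2)]))               # filled triangle: [1, 0, 0]
```

A hollow triangle has one connected component and one ring, whereas filling in the 2-simplex removes the ring. Neither result depends on the triangle's size, which is the loss of geometric information noted above.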
Persistent homology, a new branch of algebraic topology, embeds multiscale geometric information into topological invariants to achieve an interplay between geometry and topology [14]. It creates a variety of topological spaces of a given object by varying a filtration parameter, such as the radius of balls or the level set of a real-valued function. As a result, persistent homology can capture topological features continuously over a range of spatial scales, and the resulting analysis is often visualized by barcodes [6] or persistence diagrams [5]. The changes of topological invariants over scales are thus recorded by the “birth,” “death,” and “persistence” of barcodes over the filtration. Persistent homology has been applied to a variety of domains, including image/signal analysis, chaotic dynamics, sensor networks, complex networks, shape recognition, and computational biology [13]. For nano- and biomolecules, persistent homology enables a quantitative topological analysis, which reveals biomolecular “topology-function relationships,” via topological fingerprints (TFs) [9, 11]. Contrary to popular belief, short-lived topological events are not noise, but rather part of TFs; they play a valuable role in the quantitative topological analysis of protein folding stability [9] and fullerene curvature energy [8]. Differential geometry has been utilized to derive partial differential equation-based persistence for biomolecules [8]. Multidimensional persistence induced by a multiresolution analysis [12] is particularly useful for resolving ill-posed inverse problems in cryo-electron microscopy structure determination [10]. TFs provide biomolecules with a systematic and unique representation that cannot be literally cast into traditional physical interpretation.
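The birth and death of barcodes over a filtration can be sketched for the simplest case, connected components (dimension 0) of a Vietoris-Rips filtration. The code below is a hedged illustration, not the cited authors' software: `h0_barcode` and `betti0_at` are hypothetical names, and the filtration parameter is assumed to be the pairwise distance at which an edge appears. Every component is born at scale 0, and a bar dies when its component merges with another, which a union-find structure tracks.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Dimension-0 bars (birth, death) of a Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j      # two components merge ...
            bars.append((0.0, length))   # ... so one bar dies at this scale
    bars.append((0.0, math.inf))         # the last component never dies
    return bars

def betti0_at(bars, scale):
    """Number of bars alive at the given filtration value."""
    return sum(birth <= scale < death for birth, death in bars)

if __name__ == "__main__":
    bars = h0_barcode([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
    print(bars)                  # [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
    print(betti0_at(bars, 2.0))  # 2
```

The short bar (death at 1.0) and the long bar (death at 4.0) both carry information about the point spacing, which mirrors the remark that short-lived events are part of the fingerprint. Higher-dimensional bars (rings, cavities) require a full boundary-matrix reduction and are outside the scope of this sketch.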
Fortunately, this representation is ideally suited for machine learning (particularly deep learning), which captures nonlinear and high-order interactions among features in sufficiently large and intrinsically complex data sets. One of the first integrations of machine learning and TFs offered encouraging classification of tens of thousands of proteins involving hundreds of tasks [4]. However, persistent homology neglects chemical and biological information during topological simplification and is thus not as competitive as geometry- or physics-based representations in quantitative predictions. Element-specific persistent homology, or multicomponent persistent homology built on colored biomolecular networks, has been introduced to retain chemical and biological information during topological abstraction [2]. This approach enciphers biological properties, such as hydrogen bonds, van der Waals interactions, hydrophilicity, and hydrophobicity, into topological invariants, rendering a potentially revolutionary representation for biomolecules [1, 3]. Rational drug design is an imperative life science problem that ultimately tests our understanding of biological systems. Designing efficient drugs to cure diseases is one of the most challenging tasks in the biological sciences. Multicomponent persistent homology plays a crucial role in hot-spot prediction, drug-binding pose analysis, binding affinity prediction, structure optimization, toxicity analysis, and pharmacokinetic simulation. For example, the integration of machine learning with multiscale weighted colored graphs and multicomponent persistent homology provided the best free energy ranking for Set 1 (Stage 2) in D3R Grand Challenge 2, a worldwide competition in computer-aided drug design.
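One way to picture the element-specific idea is to compute a barcode separately for the atoms of each element type and to bin the bar deaths into a fixed-length vector that a machine learning model can consume. The sketch below is a simplified assumption, not the featurization of the cited work: it uses only dimension-0 bars, and the names `h0_deaths` and `element_specific_features`, the atom tuple layout, and the bin edges are all hypothetical.

```python
import math
from itertools import combinations

def h0_deaths(points):
    """Death scales of the dimension-0 bars (all born at 0) of a Rips filtration."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths

def element_specific_features(atoms, elements, bin_edges):
    """Concatenate, per element, a histogram of bar deaths over the given bins.

    atoms     -- list of (element_symbol, (x, y, z)) tuples (assumed layout)
    elements  -- element symbols to treat as separate point clouds
    bin_edges -- increasing scale values; bin k covers [edge_k, edge_{k+1})
    """
    features = []
    for element in elements:
        cloud = [xyz for symbol, xyz in atoms if symbol == element]
        deaths = h0_deaths(cloud)
        for low, high in zip(bin_edges, bin_edges[1:]):
            features.append(sum(low <= d < high for d in deaths))
    return features

if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    atoms = [
        ("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0)),
        ("N", (0.0, 1.0, 0.0)), ("N", (0.0, 4.5, 0.0)),
        ("O", (9.0, 9.0, 9.0)),
    ]
    print(element_specific_features(atoms, ["C", "N", "O"], [0.0, 2.0, 4.0, 6.0]))
    # [2, 0, 0, 0, 1, 0, 0, 0, 0]
```

The resulting vector has the same length for every molecule, and each block of entries is tied to one element type, which is how chemical identity survives the topological abstraction in this toy setting.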
Similar resources
Multidimensional persistence in biomolecular data
Persistent homology has emerged as a popular technique for the topological simplification of big data, including biomolecular data. Multidimensional persistence bears considerable promise to bridge the gap between geometry and topology. However, its practical and robust construction has been a challenge. We introduce two families of multidimensional persistence, namely pseudomultidimensional pe...
A review of geometric, topological and graph theory apparatuses for the modeling and analysis of biomolecular data
Geometric, topological and graph theory modeling and analysis of biomolecules are of essential importance in the conceptualization of molecular structure, function, dynamics, and transport. On the one hand, geometric modeling provides molecular surface and structural representation, and offers the basis for molecular visualization, which is crucial for the understanding of molecular structure a...
A quantitative structure comparison with persistent similarity
Biomolecular structure comparison not only reveals evolutionary relationships, but also sheds light on biological functional properties. However, traditional definitions of structure or sequence similarity always involve superposition or alignment and are computationally inefficient. In this paper, I propose a new method called persistent similarity, which is based on a newly-invented method in...
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and...
Persistent topology for cryo-EM data analysis
In this work, we introduce persistent homology for the analysis of cryo-electron microscopy (cryo-EM) density maps. We identify the topological fingerprint or topological signature of noise, which is widespread in cryo-EM data. For low signal-to-noise ratio (SNR) volumetric data, intrinsic topological features of biomolecular structures are indistinguishable from noise. To remove noise, we empl...
Publication date: 2017